Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides different visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available in only one domain, in our case WLI, and the endoscopic images form an unpaired dataset, i.e., there is no exact equivalent for every image in both the NBI and WLI domains. Method: We propose a semi-supervised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data, a cycle-consistency GAN that performs unpaired image-to-image translation, and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN, we perform a detailed quantitative and qualitative analysis with the help of specialists. Conclusion: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89 respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94 respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of using semi-supervised GAN-based classification to improve bladder tissue classification when annotations are limited in multi-domain data.
Translated by Google Translate
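For the bladder tissue classification results above, the following minimal sketch shows one way averaged accuracy, precision, and recall can be computed. It assumes macro averaging over classes, which the abstract does not state, and the labels in the usage example are hypothetical toy data, not the paper's.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # A class with no predictions (or no samples) contributes 0 by convention here.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical toy labels: three samples of class "a", one of class "b".
acc, prec, rec = macro_metrics(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(acc, prec, rec)
```

With imbalanced classes, a rare class that attracts false positives lowers macro precision even when accuracy and recall stay high.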
DeeProb-kit is a unified library written in Python consisting of a collection of deep probabilistic models (DPMs) that are tractable and exact representations for the modelled probability distributions. The availability of a representative selection of DPMs in a single library makes it possible to combine them in a straightforward manner, a common practice in deep learning research nowadays. In addition, it includes efficiently implemented learning techniques, inference routines, statistical algorithms, and provides high-quality fully-documented APIs. The development of DeeProb-kit will help the community to accelerate research on DPMs as well as to standardise their evaluation and better understand how they are related based on their expressivity.
In this paper, we introduce MINTIME, a video deepfake detection approach that captures spatial and temporal anomalies and handles instances of multiple people in the same video and variations in face sizes. Previous approaches disregard such information either by using simple a-posteriori aggregation schemes, i.e., average or max operation, or using only one identity for the inference, i.e., the largest one. On the contrary, the proposed approach builds on a Spatio-Temporal TimeSformer combined with a Convolutional Neural Network backbone to capture spatio-temporal anomalies from the face sequences of multiple identities depicted in a video. This is achieved through an Identity-aware Attention mechanism that attends to each face sequence independently based on a masking operation and facilitates video-level aggregation. In addition, two novel embeddings are employed: (i) the Temporal Coherent Positional Embedding that encodes each face sequence's temporal information and (ii) the Size Embedding that encodes the size of the faces as a ratio to the video frame size. These extensions allow our system to adapt particularly well in the wild by learning how to aggregate information of multiple identities, which is usually disregarded by other methods in the literature. It achieves state-of-the-art results on the ForgeryNet dataset with an improvement of up to 14% AUC in videos containing multiple people and demonstrates ample generalization capabilities in cross-forgery and cross-dataset settings. The code is publicly available at https://github.com/davide-coccomini/MINTIME-Multi-Identity-size-iNvariant-TIMEsformer-for-Video-Deepfake-Detection.
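The masking idea behind identity-aware attention can be illustrated with a small sketch. It assumes that each token carries an integer identity id and that a token may attend only to tokens of the same identity. The function names, the token layout, and the scores are hypothetical, and the actual MINTIME implementation may differ (for instance in how a video-level classification token is handled).

```python
import math


def identity_mask(identity_ids):
    """mask[i][j] is True when token i is allowed to attend to token j."""
    return [[a == b for b in identity_ids] for a in identity_ids]


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, restricted to allowed positions."""
    top = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sequence: two face tokens of identity 0, three of identity 1.
ids = [0, 0, 1, 1, 1]
mask = identity_mask(ids)
raw_scores = [0.2, 1.5, 3.0, -0.7, 0.1]  # scores of token 0 against all tokens
weights = masked_softmax(raw_scores, mask[0])
print(weights)  # positions 2..4 receive zero weight
```

Every token can always attend to itself, so the masked row is never empty and the normalisation is well defined.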
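Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models in which the components depend continuously on the latent code. They are expressive tools for generative and probabilistic modelling, but they are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, which allows them to perform exact inference, but they often show subpar performance compared with continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. Although these models are analytically intractable, they are well suited to numerical integration schemes based on a finite set of integration points. With a sufficient number of integration points, the approximation becomes exact. Moreover, with a finite set of integration points, the approximation can be compiled into a PC that performs "exact inference in an approximate model". In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt in this way set a new state of the art for tractable models on many standard density estimation benchmarks.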
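The compilation step can be made concrete with a minimal sketch. It assumes a one-dimensional standard normal latent variable, fully factorised Bernoulli components whose parameters depend on the latent code through a sigmoid, and a midpoint-rule grid with renormalised weights. All of these choices, and the parameter values, are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import product


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def integration_points(n, lo=-5.0, hi=5.0):
    """Midpoint grid on [lo, hi]; weights follow the N(0, 1) density and sum to 1."""
    step = (hi - lo) / n
    zs = [lo + (k + 0.5) * step for k in range(n)]
    raw = [math.exp(-0.5 * z * z) for z in zs]
    total = sum(raw)
    return zs, [r / total for r in raw]


def component_prob(x, z, slopes, biases):
    """p(x | z) for a factorised Bernoulli; x[i] is 0, 1, or None (marginalised)."""
    p = 1.0
    for xi, a, b in zip(x, slopes, biases):
        if xi is None:
            continue  # marginalising a variable simply drops its factor
        theta = sigmoid(a * z + b)
        p *= theta if xi == 1 else 1.0 - theta
    return p


def mixture_prob(x, zs, ws, slopes, biases):
    """Finite mixture sum_k w_k p(x | z_k) that stands in for the integral over z."""
    return sum(w * component_prob(x, z, slopes, biases) for z, w in zip(zs, ws))


# Hypothetical parameters for three binary variables.
slopes, biases = [2.0, -1.0, 0.5], [0.0, 0.3, -0.2]
zs, ws = integration_points(64)
total = sum(mixture_prob(x, zs, ws, slopes, biases) for x in product([0, 1], repeat=3))
marginal = mixture_prob((1, None, None), zs, ws, slopes, biases)
print(total, marginal)
```

Once the integration points are fixed, the model is an ordinary finite mixture, so marginals such as `mixture_prob((1, None, None), ...)` are exact for the approximate model and need no enumeration over the remaining variables.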
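Due to the ubiquity of surveillance camera networks, automatic people counting from images has recently attracted attention for urban monitoring in modern smart cities. Current computer vision techniques rely on deep learning-based algorithms that estimate pedestrian density in still images. Only a handful of works exploit temporal consistency in video sequences. In this work, we propose a spatio-temporal attentive neural network to estimate the number of pedestrians in surveillance videos. By exploiting the temporal correlation between consecutive frames, we lower the state-of-the-art counting error by 5% and the localization error by 7.5% on the widely used FDST benchmark.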
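Learning algorithms for deep neural networks are typically based on supervised end-to-end Stochastic Gradient Descent (SGD) training with error backpropagation (backprop). Backprop algorithms require a large number of labelled training samples to achieve high performance. However, in many realistic applications, even if plenty of image samples are available, very few of them are labelled, and semi-supervised, sample-efficient training strategies have to be used. Hebbian learning represents a possible approach towards sample-efficient training; however, in current solutions, it does not scale well to large datasets. In this paper, we present FastHebb, an efficient and scalable solution for Hebbian learning that 1) merges update computation and aggregation over a batch of inputs, and 2) leverages efficient matrix multiplication algorithms on GPU. We validate our approach on different computer vision benchmarks in a semi-supervised learning scenario. FastHebb outperforms previous solutions in terms of training speed and, notably, for the first time we are able to bring Hebbian algorithms to ImageNet scale.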
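The benefit of merging update computation and aggregation can be seen in a small sketch. It assumes a common Hebbian rule of the form Δw_j = η Σ_b y_bj (x_b − w_j), which is an illustrative choice and not necessarily the exact rule used in the paper. The sum splits into (YᵀX)_j − (Σ_b y_bj) w_j, so the batch aggregate reduces to one matrix product instead of a per-sample update tensor. The function names are hypothetical, and pure Python lists stand in for GPU tensors.

```python
def naive_update(X, Y, W, lr):
    """Build every per-sample update (B x N x D values), then sum over the batch."""
    n, d = len(W), len(W[0])
    per_sample = [
        [[y[j] * (x[i] - W[j][i]) for i in range(d)] for j in range(n)]
        for x, y in zip(X, Y)
    ]
    return [[lr * sum(u[j][i] for u in per_sample) for i in range(d)] for j in range(n)]


def fused_update(X, Y, W, lr):
    """Same result via Y^T X minus a row-scaled W; no per-sample tensor is stored."""
    n, d = len(W), len(W[0])
    # (Y^T X)[j][i]: on a GPU this would be a single matrix multiplication.
    ytx = [[sum(y[j] * x[i] for x, y in zip(X, Y)) for i in range(d)] for j in range(n)]
    ysum = [sum(y[j] for y in Y) for j in range(n)]
    return [[lr * (ytx[j][i] - ysum[j] * W[j][i]) for i in range(d)] for j in range(n)]


# Hypothetical batch of 3 inputs (D = 2) and activations of N = 2 neurons.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
Y = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
W = [[0.1, 0.2], [0.3, 0.4]]
print(naive_update(X, Y, W, 0.01))
print(fused_update(X, Y, W, 0.01))
```

Both functions return the same update up to floating-point rounding. The fused form avoids materialising B × N × D intermediate values, which is what makes large batches and layers practical.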
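Deepfake generation techniques are evolving at a rapid pace, making it possible to create realistic manipulated images and videos and endangering the serenity of modern society. The continual emergence of new techniques brings a further problem to be faced, namely the ability of deepfake detection models to update themselves promptly in order to identify manipulations carried out with even the most recent methods. This is an extremely complex problem, because training a model requires large amounts of data, which are difficult to obtain if the deepfake generation method is too recent. Moreover, continuously retraining a network would be unfeasible. In this paper, we ask whether, among the various deep learning techniques, there is one able to generalise the concept of deepfake to such an extent that it does not remain tied to one or more specific deepfake generation methods used in the training set. We compare a Vision Transformer with an EfficientNetV2 in a cross-forgery context based on the ForgeryNet dataset. Our experiments show that EfficientNetV2 has a greater tendency to specialise, often obtaining better results on the methods seen during training, while Vision Transformers exhibit a superior generalisation ability that makes them more competent even on images generated with new methods.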
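This paper presents the methodology and outcomes of a series of educational events conducted in 2021, which leveraged robot swarms to educate high-school and university students about epidemiological models and how they can inform societal and governmental policies. With a specific focus on the COVID-19 pandemic, the events consisted of 4 online and 3 in-person workshops in which students had the chance to interact with a swarm of 20 custom-built brushbots, small-scale vibration-driven robots optimized for portability and robustness. Through the analysis of data collected in a post-event survey, this paper shows how the events positively affected the students' views on the scientific method as a guide for real-world decision making, as well as their interest in robotics.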
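In the context of constraint-driven control of multi-robot systems, in this paper we propose an optimization-based framework that is able to ensure the resilience and energy-awareness of teams of robots. The approach is based on a novel, frame-theoretic measure of resilience, which allows us to analyze and enforce resilient behaviors of multi-robot systems. The properties of resilience and energy-awareness are encoded as constraints of a convex optimization program that is used to synthesize the robot control inputs. This allows such properties to be combined with the execution of coordinated tasks in order to achieve resilient and energy-aware robot operations. The effectiveness of the proposed method is illustrated in a simulated scenario in which a team of robots is deployed to execute two tasks subject to energy and resilience constraints.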
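The pattern of encoding a property as a constraint of a convex program can be sketched in its simplest form. The sketch assumes a single linear constraint a·u ≥ b on the control input u and a quadratic cost that keeps u close to a nominal task input; in that special case the program has a closed-form solution (a projection onto a half-space). The constraint, the function name, and the numbers are hypothetical, and the paper's actual resilience and energy constraints are not given in the abstract.

```python
def constrained_input(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a . u >= b (a must be non-zero)."""
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    if dot >= b:
        return list(u_nom)  # the nominal input already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    scale = (b - dot) / norm_sq
    # Move the minimum distance along a needed to reach the constraint boundary.
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


# Hypothetical case: the task asks for u = (1, 0), the constraint requires u[1] >= 0.5.
print(constrained_input([1.0, 0.0], [0.0, 1.0], 0.5))
```

With several simultaneous constraints, no closed form is available in general and a quadratic programming solver would be used instead.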
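Although convolutional neural networks (CNNs) have shown remarkable results in many vision tasks, they are still strained by simple yet challenging visual reasoning problems. Inspired by the recent success of Transformer networks in computer vision, in this paper we introduce the Recurrent Vision Transformer (RViT) model. Thanks to the impact of recurrent connections and spatial attention in reasoning tasks, this network achieves competitive results on the same-different visual reasoning problems from the SVRT dataset. Weight sharing in both the spatial and depth dimensions regularizes the model, allowing it to learn with far fewer free parameters, using only 28k training samples. A comprehensive ablation study confirms the importance of a hybrid CNN + Transformer architecture and the role of the feedback connections, which iteratively refine the internal representation until a stable prediction is obtained. Finally, this study can provide a deeper understanding of the role of attention and recurrent connections in solving visual abstract reasoning tasks.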